Exposing ambiguities in a relation-extraction gold standard with crowdsourcing

نویسندگان

  • Tong Shu Li
  • Benjamin M. Good
  • Andrew I. Su
چکیده

Semantic relation extraction is one of the frontiers of biomedical natural language processing research. Gold standards are key tools for advancing this research. It is challenging to generate these standards because of the high cost of expert time and the difficulty in establishing agreement between annotators. We implemented and evaluated a microtask crowdsourcing approach that can produce a gold standard for extracting drug-disease relations. The aggregated crowd judgment agreed with expert annotations from a pre-existing corpus on 43 of 60 sentences tested. The levels of crowd agreement varied in a similar manner to the levels of agreement among the original expert annotators. This work reinforces the power of crowdsourcing in the process of assembling gold standards for relation extraction. Further, it highlights the importance of exposing the levels of agreement between human annotators, expert or crowd, in gold standard corpora as these are reproducible signals indicating ambiguities in the data or in the annotation guidelines.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Crowdsourcing the Extraction of Data Practices from Privacy Policies

Website and mobile application privacy policies are intended to describe the system’s data practices. However, they are often written in non-standard formats and contain ambiguities that make it difficult for users to read and comprehend these documents. We propose a crowdsourcing approach to extract data practices from privacy policies to provide more concise and useable privacy notices to use...

متن کامل

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

متن کامل

Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission ...

متن کامل

University of Glasgow (qirdcsuog) at TREC Crowdsourcing 2011: TurkRank-Network-based Worker Ranking in Crowdsourcing

For TREC Crowdsourcing 2011 (Stage 2) we propose a networkbased approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine ev...

متن کامل

Self-Crowdsourcing Training for Relation Extraction

One expensive step when defining crowdsourcing tasks is to define the examples and control questions for instructing the crowd workers. In this paper, we introduce a self-training strategy for crowdsourcing. The main idea is to use an automatic classifier, trained on weakly supervised data, to select examples associated with high confidence. These are used by our automatic agent to explain the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1505.06256  شماره 

صفحات  -

تاریخ انتشار 2015